Circumventing Data Quality Problems Using Multiple Join Paths

Authors

  • Yannis Kotidis
  • Amélie Marian
  • Divesh Srivastava
Abstract

We propose the Multiple Join Path (MJP) framework for obtaining high-quality information by linking fields across multiple databases when the underlying databases contain poor-quality data, characterized by violations of integrity constraints such as keys and functional dependencies within and across databases. MJP associates quality scores with candidate answers by first scoring individual data paths between a pair of field values, taking into account data quality with respect to specified integrity constraints, and then agglomerating scores across multiple data paths that serve as corroborating evidence for a candidate answer. We address the problem of finding the top-few (highest-quality) answers in the MJP framework using novel techniques, and demonstrate the utility of our techniques using real data and our Virtual Integration Prototype testbed.
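The two-stage scoring described in the abstract (score each data path, then agglomerate over corroborating paths and keep the top few answers) can be illustrated with a minimal sketch. The abstract does not state the actual scoring or agglomeration functions, so everything here is an assumption made for illustration: a path is scored as the product of hypothetical per-edge quality values in [0, 1], scores are combined with a noisy-or, and the names `path_score`, `aggregate`, `top_answers`, and the sample data are invented.

```python
from collections import defaultdict
from heapq import nlargest
from math import prod


def path_score(edge_qualities):
    """Score one data path as the product of its per-edge quality values.

    Assumption: each edge carries a quality value in [0, 1] reflecting how
    well it respects the specified integrity constraints.
    """
    return prod(edge_qualities)


def aggregate(path_scores):
    """Combine corroborating path scores with a noisy-or: 1 - prod(1 - s).

    Assumption: this is only one plausible agglomeration function.
    """
    return 1.0 - prod(1.0 - s for s in path_scores)


def top_answers(paths, k):
    """Return the k best (answer, score) pairs.

    paths: iterable of (candidate_answer, [edge quality, ...]) tuples,
    one tuple per data path that reaches that candidate answer.
    """
    by_answer = defaultdict(list)
    for answer, edge_qualities in paths:
        by_answer[answer].append(path_score(edge_qualities))
    scored = {answer: aggregate(scores) for answer, scores in by_answer.items()}
    return nlargest(k, scored.items(), key=lambda item: item[1])


# Hypothetical data: two paths corroborate one answer, a single path supports another.
example_paths = [
    ("alice@example.com", [0.9, 0.8]),
    ("alice@example.com", [0.5]),
    ("bob@example.com", [0.7]),
]
best = top_answers(example_paths, k=1)
```

In this sketch, an answer reached by several moderately reliable paths can outrank an answer reached by a single path, which is the corroboration effect the abstract describes. A naive version like this one scores every path before ranking; the abstract's "novel techniques" for finding the top-few answers are not reproduced here.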

Similar resources

The Bellman Data Quality Browser

Keynote Talk Abstract: Data quality is a serious concern in complex industrial-scale databases, which often have thousands of tables and tens of thousands of columns. Commonly encountered problems include missing data (null values), duplicates and default values in columns supposed to be treated as keys, data inconsistencies (violation of functional dependencies), and poor quality join paths (lack ...

Using MOLP based procedures to solve DEA problems

Data envelopment analysis (DEA) is a technique used to evaluate the relative efficiency of comparable decision making units (DMUs) with multiple inputs and outputs. It computes a scalar measure of efficiency and discriminates between efficient and inefficient DMUs. It can also provide reference units for inefficient DMUs without consideration of the decision makers’ (DMs) preferences. In this paper, ...

Join Constraints

Many application domains involve constraints that, at a conceptual modeling level, apply to one or more schema paths, each of which involves one or more conceptual joins (where the same conceptual object plays roles in two relationships). Popular information modeling approaches typically provide only weak support for such join constraints. This paper contrasts how join constraints are catered f...

Constraints on Conceptual Join Paths

To ensure that a software system accurately reflects the business domain that it models, the system needs to enforce the business rules (constraints and derivation rules) that apply to that domain. From a conceptual modeling perspective, many application domains involve constraints over one or more conceptual schema paths that include one or more conceptual joins (where the same conceptual obje...

Multicast State Distribution by Joins Using Multiple Shortest Paths

The lack of resources in routers will become a crucial issue with the deployment of state storing protocols. In particular, single or any source multicast protocols will most probably take over large amounts of resources for maintaining multicast tree information. The aim of this paper is to study the possibility and benefit of using multiple shortest paths in order for a new member to reach a ...

Publication date: 2006